Overview

Dataset Statistics

Number of Variables 8
Number of Rows 99224
Missing Cells 145903
Missing Cells (%) 18.4%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 44.5 MB
Average Row Size in Memory 470.1 B
Variable Types
  • Numerical: 1
  • Categorical: 7
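
The missing-cell figures above can be cross-checked by hand. The following is a minimal sketch, assuming the two columns flagged as Missing in this report are the only ones with missing values (every other variable reports 0); the variable names are illustrative and not part of the profiling tool.

```python
# Figures copied from this report; variable names are illustrative only.
n_rows = 99224
n_columns = 8
missing_per_column = {
    "review_comment_title": 87656,
    "review_comment_message": 58247,
}

# Missing cells across the whole table, as a count and as a share of all cells.
total_missing = sum(missing_per_column.values())
total_cells = n_rows * n_columns
missing_cells_pct = 100 * total_missing / total_cells

# Per-column share of missing values, relative to the number of rows.
missing_pct_by_column = {
    name: 100 * count / n_rows for name, count in missing_per_column.items()
}

print(total_missing, round(missing_cells_pct, 1))
for name, pct in missing_pct_by_column.items():
    print(name, round(pct, 2))
```

Under that assumption the two per-column counts add up to the reported Missing Cells total.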

Dataset Insights

index is uniformly distributed [Uniform]
review_comment_title has 87656 (88.34%) missing values [Missing]
review_comment_message has 58247 (58.7%) missing values [Missing]
review_id has a high cardinality: 98410 distinct values [High Cardinality]
order_id has a high cardinality: 98673 distinct values [High Cardinality]
review_comment_title has a high cardinality: 4527 distinct values [High Cardinality]
review_comment_message has a high cardinality: 36159 distinct values [High Cardinality]
review_creation_date has a high cardinality: 636 distinct values [High Cardinality]
review_answer_timestamp has a high cardinality: 98248 distinct values [High Cardinality]
review_id has constant length 32 [Constant Length]
order_id has constant length 32 [Constant Length]
review_score has constant length 1 [Constant Length]
review_creation_date has constant length 19 [Constant Length]
review_answer_timestamp has constant length 19 [Constant Length]

Variables


index

numerical

Approximate Distinct Count 99224
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 1587584
Mean 49611.5
Minimum 0
Maximum 99223
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5-th Percentile 4961.15
Q1 24805.75
Median 49611.5
Q3 74417.25
95-th Percentile 94261.85
Maximum 99223
Range 99223
IQR 49611.5

Descriptive Statistics

Mean 49611.5
Standard Deviation 28643.6459
Variance 8.2046e+08
Sum 4.9227e+09
Skewness 0
Kurtosis -1.2
Coefficient of Variation 0.5774
  • index is not normally distributed (p-value 8.530609293743981e-198)
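
The statistics for index are what one would expect from a plain row counter running from 0 to 99223. As a rough check, the sketch below recomputes them in closed form, assuming the column holds exactly the integers 0..n-1, that the standard deviation is the sample version (ddof = 1), and that quantiles use linear interpolation. These are assumptions about how the report was computed, not documented behaviour of the tool.

```python
import math

n = 99224  # number of rows reported in the overview

# Closed-form statistics for the integers 0, 1, ..., n-1.
mean = (n - 1) / 2
total = n * (n - 1) // 2
sample_variance = n * (n + 1) / 12  # sum of squared deviations / (n - 1)
std = math.sqrt(sample_variance)
coefficient_of_variation = std / mean


def quantile(q: float) -> float:
    """Linear-interpolation quantile of the sorted values 0..n-1."""
    return q * (n - 1)


iqr = quantile(0.75) - quantile(0.25)

print(mean, total, round(std, 4), round(coefficient_of_variation, 4))
print(quantile(0.05), quantile(0.25), quantile(0.75), quantile(0.95), iqr)
```

The reported skewness of 0 and kurtosis of -1.2 are likewise the values for a uniform distribution (excess kurtosis), which fits the "uniformly distributed" insight.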

review_id

categorical

Approximate Distinct Count 98410
Approximate Unique (%) 99.2%
Missing 0
Missing (%) 0.0%
Memory Size 9624728

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 7bc2406110b926393a...
2nd row 80e641a11e56f04c1a...
3rd row 228ce5500dc1d8e020...
4th row e64fb393e7b32834bb...
5th row f7c4243c7fe1938f18...

Letter

Count 1191118
Lowercase Letter 1191118
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1984050
  • review_id contains many words: 98410 words
  • review_id has words of constant length
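
The character counts above are consistent with IDs made up only of lowercase letters and digits: the two counts sum to rows × 32. The sketch below checks that arithmetic and shows one way such IDs could be validated. The regular expression and the example ID are assumptions for illustration; the report itself does not state the ID alphabet.

```python
import re

# Figures copied from the review_id section of this report.
n_rows = 99224
id_length = 32
lowercase_letters = 1191118
decimal_digits = 1984050

# Every character of every ID is accounted for by letters plus digits.
total_characters = n_rows * id_length
accounted_for = lowercase_letters + decimal_digits

# Assumed pattern: exactly 32 lowercase alphanumeric characters.
ASSUMED_ID_PATTERN = re.compile(r"[0-9a-z]{32}")


def looks_like_id(value: str) -> bool:
    """Return True if value matches the assumed 32-character ID pattern."""
    return ASSUMED_ID_PATTERN.fullmatch(value) is not None


hypothetical_id = "0123456789abcdef0123456789abcdef"  # not from the dataset
print(accounted_for == total_characters, looks_like_id(hypothetical_id))
```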

order_id

categorical

Approximate Distinct Count 98673
Approximate Unique (%) 99.4%
Missing 0
Missing (%) 0.0%
Memory Size 9624728

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row 73fc7af87114b39712...
2nd row a548910a1c6147796b...
3rd row f9e4b658b201a9f2ec...
4th row 658677c97b385a9be1...
5th row 8e6bfb81e283fa7e4f...

Letter

Count 1190495
Lowercase Letter 1190495
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 1984673
  • order_id contains many words: 98673 words
  • order_id has words of constant length

review_score

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 6548784

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 4
2nd row 5
3rd row 5
4th row 5
5th row 5

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 99224
  • The top 2 categories (5, 4) account for over 50.0% of values
  • The most frequent value (5) occurs over 2.99 times as often as the second most frequent value (4)
  • review_score has words of constant length

review_comment_title

categorical

Approximate Distinct Count 4527
Approximate Unique (%) 39.1%
Missing 87656
Missing (%) 88.3%
Memory Size 996511

Length

Mean 11.945
Standard Deviation 6.3286
Median 10
Minimum 1
Maximum 26

Sample

1st row recomendo
2nd row Super recomendo
3rd row Não chegou meu pro...
4th row Ótimo
5th row Muito bom.

Letter

Count 119690
Lowercase Letter 105535
Space Separator 12092
Uppercase Letter 14155
Dash Punctuation 18
Decimal Number 1065
  • review_comment_title contains many words: 2082 words
  • The most frequent word (recomendo) occurs over 1.57 times as often as the second most frequent word (bom)

review_comment_message

categorical

Approximate Distinct Count 36159
Approximate Unique (%) 88.2%
Missing 58247
Missing (%) 58.7%
Memory Size 8237897

Length

Mean 68.6377
Standard Deviation 53.8492
Median 53
Minimum 1
Maximum 208

Sample

1st row Recebi bem antes d...
2nd row Parabéns lojas lan...
3rd row aparelho eficiente...
4th row Mas um pouco ,trav...
5th row Vendedor confiável...

Letter

Count 2213098
Lowercase Letter 2086712
Space Separator 443243
Uppercase Letter 126386
Dash Punctuation 716
Decimal Number 14418
  • review_comment_message contains many words: 20814 words

review_creation_date

categorical

Approximate Distinct Count 636
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Memory Size 8334816

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2018-01-18 00:00:0...
2nd row 2018-03-10 00:00:0...
3rd row 2018-02-17 00:00:0...
4th row 2017-04-21 00:00:0...
5th row 2018-03-01 00:00:0...

Letter

Count 0
Lowercase Letter 0
Space Separator 99224
Uppercase Letter 0
Dash Punctuation 198448
Decimal Number 1389136
  • The most frequent word (000000, apparently the time part with punctuation stripped) occurs over 214.12 times as often as the second most frequent word (20171219)
  • review_creation_date has words of constant length

review_answer_timestamp

categorical

Approximate Distinct Count 98248
Approximate Unique (%) 99.0%
Missing 0
Missing (%) 0.0%
Memory Size 8334816

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row 2018-01-18 21:46:5...
2nd row 2018-03-11 03:05:1...
3rd row 2018-02-18 14:36:2...
4th row 2017-04-21 22:02:0...
5th row 2018-03-02 10:26:5...

Letter

Count 0
Lowercase Letter 0
Space Separator 99224
Uppercase Letter 0
Dash Punctuation 198448
Decimal Number 1389136
  • review_answer_timestamp contains many words: 53993 words
  • review_answer_timestamp has words of constant length
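
Both review_creation_date and review_answer_timestamp are profiled as categorical strings of constant length 19, so most of the insights above (cardinality, word counts) describe text rather than time. A minimal sketch of converting such strings to datetimes follows, assuming the format is `YYYY-MM-DD HH:MM:SS`. That format is inferred from the truncated samples and the per-row character counts (14 digits, 2 dashes, 1 space), and the example value is hypothetical rather than taken from the dataset.

```python
from datetime import datetime

# Assumed layout of the 19-character strings; not confirmed by the report.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str) -> datetime:
    """Parse a 19-character timestamp string using the assumed format."""
    return datetime.strptime(text, ASSUMED_FORMAT)


example = "2020-01-02 03:04:05"  # hypothetical value, not from the dataset
parsed = parse_timestamp(example)

# Character classes of the example match the per-row counts in the report.
digits = sum(ch.isdigit() for ch in example)
dashes = example.count("-")
spaces = example.count(" ")

print(parsed.isoformat(), len(example), digits, dashes, spaces)
```

Once parsed, the columns can be profiled as datetimes, which is usually more informative than string length and word statistics.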

Interactions

Correlations

Missing Values